1. Project Objectives

The objective of this project is to forecast lettuce demand for four individual restaurants owned by a large US fast-food retail chain that makes submarine sandwiches (subs), in order to enable managers to make inventory replenishment decisions. Towards this goal, ARIMA and Holt-Winters models are employed and evaluated in order to forecast demand over the two-week window from 06/16/2015 to 06/29/2015. The data in this window are unseen, and the performance of the models will be judged on Mean Squared Deviation (MSD).
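Mean Squared Deviation is taken here as the average of the squared forecast errors over the evaluation window; a minimal sketch with made-up numbers (the msd helper and the two vectors are illustrative, not project data):

```r
#Mean Squared Deviation between held-out actuals and forecasts
msd <- function(actual, forecast) mean((actual - forecast)^2)

actual   <- c(150, 170, 160) #made-up daily lettuce demand
forecast <- c(155, 165, 158) #made-up forecasts
msd(actual, forecast) #(25 + 25 + 4) / 3 = 18
```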

2. Data Preprocessing

The data is contained in 11 different datasets, ranging from POS transactional data to recipes, portion amounts and ingredients.

These datasets are read in:

ingredients<-read.csv(file="/Users/arma/Desktop/Data/ingredients.csv", header= TRUE, sep=",")
menu_items<-read.csv(file="/Users/arma/Desktop/Data/menu_items.csv", header= TRUE, sep=",")
menuitem<-read.csv(file="/Users/arma/Desktop/Data/menuitem.csv", header= TRUE, sep=",")
portion_uom_types<-read.csv(file="/Users/arma/Desktop/Data/portion_uom_types.csv", header= TRUE, sep=",")
pos_ordersale<-read.csv(file="/Users/arma/Desktop/Data/pos_ordersale.csv", header= TRUE, sep=",")
recipe_ingredient_assignments<-read.csv(file="/Users/arma/Desktop/Data/recipe_ingredient_assignments.csv", header= TRUE, sep=",")
recipe_sub_recipe_assignments<-read.csv(file="/Users/arma/Desktop/Data/recipe_sub_recipe_assignments.csv", header= TRUE, sep=",")
recipes<-read.csv(file="/Users/arma/Desktop/Data/recipes.csv", header= TRUE, sep=",")
sub_recipe_ingr_assignments<-read.csv(file="/Users/arma/Desktop/Data/sub_recipe_ingr_assignments.csv", header= TRUE, sep=",")
sub_recipes<-read.csv(file="/Users/arma/Desktop/Data/sub_recipes.csv", header= TRUE, sep=",")
library(readxl)
store_restaurant<-read_excel("/Users/arma/Desktop/Data/store_restaurant.xlsx", col_names = TRUE)

#Libraries used throughout the remainder of the analysis
library(dplyr)      #inner_join, mutate, group_by, summarise
library(ggplot2)    #Plotting
library(gridExtra)  #grid.arrange
library(forecast)   #Arima, auto.arima, accuracy, ggtsdisplay, checkresiduals
library(tseries)    #adf.test, pp.test, kpss.test
library(knitr)      #kable
library(kableExtra) #kable_styling

The aim of this section is to calculate the total quantity of lettuce demanded each day at each store. This is done as follows:

  1. Find all the recipes that include lettuce (lettuce IngredientId=27)
lettuce_recipes<-recipe_ingredient_assignments[recipe_ingredient_assignments$IngredientId==27,]
  2. Since lettuce can also enter the recipes via the subs, this must also be accounted for
subs_with_lettuce<-sub_recipe_ingr_assignments[sub_recipe_ingr_assignments$IngredientId==27,] #Lettuce only
subs_lettuce_recipes<- inner_join(subs_with_lettuce,recipe_sub_recipe_assignments, by="SubRecipeId") #In order to get RecipeId and Factor 
subs_lettuce_recipes<- subs_lettuce_recipes %>% mutate(lettuce_weight=Quantity*Factor) #Total amount of lettuce in each sub recipe is given by factor * quantity 
aggregated_subs_lettuce<- subs_lettuce_recipes %>% group_by(RecipeId,IngredientId) %>% summarise(lettuce_used=sum(lettuce_weight)) #Sums up lettuce used for each RecipeId
  3. Combine lettuce_recipes and aggregated_subs_lettuce
colnames(aggregated_subs_lettuce)[3]<-"Quantity" #Rename so as to match the titles in lettuce_recipes
lettuce_recipes<- as.data.frame(lettuce_recipes)
aggregated_subs_lettuce<- as.data.frame(aggregated_subs_lettuce)
all_lettuce_recipes<-rbind(lettuce_recipes, aggregated_subs_lettuce)
total_lettuce_recipes<- all_lettuce_recipes %>% group_by(RecipeId) %>%  summarise(lettuce_quantity=sum(Quantity))
  4. Find out how much is sold each day in the shops
menu<-inner_join(menuitem, menu_items, by = c("Id" = "MenuItemId")) #In order to get Quantity Purchased
lettuce_menu_item<-inner_join(menu, total_lettuce_recipes, by="RecipeId")
lettuce_menu_item$lettuce_demand<- lettuce_menu_item$Quantity * lettuce_menu_item$lettuce_quantity
lettuce_menu_item$date<-as.Date(lettuce_menu_item$date, format="%y-%m-%d") #Make date column a date object
  5. Split into separate store locations and add up orders
cali1<- lettuce_menu_item[lettuce_menu_item$StoreNumber==46673,]
cali2<- lettuce_menu_item[lettuce_menu_item$StoreNumber==4904,]
ny1<- lettuce_menu_item[lettuce_menu_item$StoreNumber==12631,]
ny2<- lettuce_menu_item[lettuce_menu_item$StoreNumber==20974,]

#Aggregate lettuce demanded
cali1<- cali1 %>% arrange(date) %>% group_by(date) %>% summarise(aggregate_weight=sum(lettuce_demand))
cali2<- cali2 %>% arrange(date) %>% group_by(date) %>%  summarise(aggregate_weight=sum(lettuce_demand))
ny1<- ny1 %>% arrange(date) %>% group_by(date) %>% summarise(aggregate_weight=sum(lettuce_demand))
ny2<- ny2 %>% arrange(date) %>% group_by(date) %>% summarise(aggregate_weight=sum(lettuce_demand))

Upon inspecting NY2, it can be seen that there are time gaps in the first six rows. Non-consecutive days would make fitting a time series model problematic; therefore, these six rows are removed.

ny2<-ny2[-c(1:6),]
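The gap check described above can be sketched on a toy vector of dates (illustrative only; the real check would be run on ny2$date): positions where consecutive dates differ by more than one day mark the breaks.

```r
#Illustrative dates only, not the store data
dates <- as.Date(c("2015-03-20", "2015-03-21", "2015-03-25", "2015-03-26"))
gaps <- which(diff(dates) > 1) #index after which a gap begins
gaps #gap after position 2 (the jump from 03-21 to 03-25)
```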

3. ARIMA Models

This section begins with a qualitative assessment of the time series plots for each of the shops, looking for evidence of seasonality, trend, cyclicity and stationarity.

p1<- cali1 %>% ggplot(aes(x=date,y=aggregate_weight)) + geom_line() + ggtitle("Cali1") +ylab("Lettuce Demand")
p2<- cali2 %>% ggplot(aes(x=date,y=aggregate_weight)) + geom_line() + ggtitle("Cali2") +ylab("Lettuce demand")
p3<- ny1 %>% ggplot(aes(x=date,y=aggregate_weight)) + geom_line() + ggtitle("NY1") +ylab("Lettuce demand")
p4<- ny2 %>% ggplot(aes(x=date,y=aggregate_weight)) + geom_line() + ggtitle("NY2") +ylab("Lettuce demand")
grid.arrange(p1, p2, p3, p4, nrow = 2)

1. Cali1 Store: Given that the data available spans March to June 2015, there is not enough data to determine whether there is in fact cyclicity, and this is subsequently ignored. Additionally, there appears to be little to no evidence of a trend in the Cali1 plot (no overall increase). However, within the months there is evidence of weekly seasonality (a period of 7 days). Finally, the mean and the variance appear roughly constant, implying trend stationarity.

2. Cali2 Store: Similar to the Cali1 Store above, Cali2 exhibits no observable trend and there appears to be weekly seasonality also. As above, mean and variance appear to be roughly constant, implying trend stationarity.

3. NY1: By contrast, the mean and variance for NY1 do not appear constant, and the series consequently appears non-stationary. Additionally, seasonality is also evident.

4. NY2: In NY2, mean and variance are roughly constant, indicating trend stationarity.

For fitting the ARIMA models, the Box-Jenkins procedure is followed:

  1. Identification: Pre-processing of the data to make it stationary - stationarity is indicated by the autocorrelations decaying to zero quickly. Once the time series is stationary, an ARIMA model is fit using indicative orders suggested by the ACF and PACF.

  2. Estimation: The parameters are decided based on information criteria (AIC/BIC).

  3. Verification: Model fit to the data is determined by performing residual analyses - visual inspection of the distribution of residuals as well as formal tests such as Ljung-Box.
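The three steps map roughly onto forecast-package calls as follows; a sketch on a simulated weekly-seasonal series, not the store data:

```r
library(forecast)

#Simulated daily series with weekly (frequency 7) seasonality, for illustration only
set.seed(1)
y <- ts(rnorm(84, mean = 150, sd = 10) + rep(c(0, 5, 0, 0, 0, 20, 10), 12), frequency = 7)

#1. Identification: how much trend and seasonal differencing is needed?
ndiffs(y)
nsdiffs(y)

#2. Estimation: orders selected by information criteria
fit <- auto.arima(y)

#3. Verification: residual diagnostics, including the Ljung-Box test
checkresiduals(fit, plot = FALSE)
```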

3.1 Cali1 Store

3.11 Stationarize The Time Series

Train/Test Split: At the start of this analysis, the data are split into training and test sets. The Cali1 dataset has 103 observations and subsequent analyses are performed with an 80/20 train-test split. Consequently, the models are trained on data from the 5th of March 2015 to the 25th of May 2015 (1st to 82nd observation). As noted above, given that the data are sampled daily with weekly seasonality, the time series is formulated with a frequency of 7.

cali1_train<-cali1[1:82,] #Train on the first 82 observations
cali1_train<-ts(cali1_train[,2],frequency = 7, start=c(03,05)) #Make ts object
cali1_test<-cali1[83:dim(cali1)[1],] #Test on from 83rd to last
cali1_test<-ts(cali1_test[,2],frequency = 7, start=c(15,3)) #Make ts object

The plots of the timeseries, ACF and PACF are observed:

ggtsdisplay(cali1_train) #Look at timeseries, ACF, PACF plots for Cali1 Store

As seen from the ACF plot above, there are spikes occurring at lags that are multiples of 7 (7, 14 and 21), reinforcing the earlier conclusion that seasonality occurs with a period of 7 days.

In Section 3 above, it was assumed that the Cali1 time series was trend stationary. This is verified using the ndiffs() function to determine the number of first-order differences required to stationarise the series - as expected, it returns 0.

 ndiffs(cali1_train) #Number of first order differences required to stationarise the time series. 
## [1] 0

From observing the time series above, the magnitude of the seasonal fluctuations does not vary with the level of the series. Therefore, the seasonal component is assumed to be additive, and the nsdiffs() function is used to find the number of seasonal differences required.

 nsdiffs(cali1_train) #Number of seasonal differences required to stationarise the time series.
## [1] 1

As seen above one seasonal difference needs to be taken.

cali1_ts_diff<- diff(cali1_train, lag = 7) #One seasonal difference
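A seasonal difference at lag 7 simply subtracts the observation from the same weekday one week earlier (y_t = x_t - x_{t-7}); a quick check on a toy vector:

```r
#Toy vector: with x = 1:21, every lag-7 difference equals exactly 7
x <- 1:21
d <- diff(x, lag = 7)
stopifnot(all(d == 7), length(d) == length(x) - 7) #7 observations are lost
```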

This new differenced object is visually observed:

autoplot(cali1_ts_diff) + labs(y="Differenced Lettuce Demand for Cali 1")

As seen above, the plot appears stationary with the seasonal component removed. Stationarity is formally tested by applying the Augmented Dickey-Fuller (ADF), Phillips-Perron (PP) and Kwiatkowski-Phillips-Schmidt-Shin (KPSS) tests. The ADF and PP tests take the null hypothesis that there is a unit root in the time series, i.e. that it is non-stationary. Conversely, the KPSS test takes the null hypothesis that the time series is trend stationary.

adf.test(cali1_ts_diff) #ADF Test
## Warning in adf.test(cali1_ts_diff): p-value smaller than printed p-value
## 
##  Augmented Dickey-Fuller Test
## 
## data:  cali1_ts_diff
## Dickey-Fuller = -4.1898, Lag order = 4, p-value = 0.01
## alternative hypothesis: stationary
pp.test(cali1_ts_diff) #PP Test
## Warning in pp.test(cali1_ts_diff): p-value smaller than printed p-value
## 
##  Phillips-Perron Unit Root Test
## 
## data:  cali1_ts_diff
## Dickey-Fuller Z(alpha) = -70.036, Truncation lag parameter = 3, p-value
## = 0.01
## alternative hypothesis: stationary
kpss.test(cali1_ts_diff) #KPSS Test
## Warning in kpss.test(cali1_ts_diff): p-value greater than printed p-value
## 
##  KPSS Test for Level Stationarity
## 
## data:  cali1_ts_diff
## KPSS Level = 0.21189, Truncation lag parameter = 3, p-value = 0.1

As seen, the null hypothesis is rejected for the ADF and PP tests, while the null cannot be rejected for the KPSS at a significance level of 5%. Thus, it is concluded that the time series is adequately stationarised.

3.12 Parameter Estimation

With the series now stationarised, the next step in the analysis is to determine the orders of the AR and MA components from the ACF and PACF. The PACF is expected to indicate the order of the AR component, while the ACF is expected to indicate that of the MA component. As no first-order (trend) differences were required, potential models will be of the form ARIMA(p,0,q)(P,1,Q)[7].

cali1_acf<-acf(cali1_ts_diff, lag.max = 21, plot= FALSE) #ACF of Differenced Data
plot(cali1_acf, main= "Cali 1 Store ACF")

cali1_pacf<-pacf(cali1_ts_diff,lag.max = 21, plot=FALSE) #PACF of Differenced Data
plot(cali1_pacf, main= "Cali 1 Store PACF")

In the ACF plot, the only significant lag is at 7. In the PACF plot, beyond the significant spike at lag 7, an exponential decay pattern can be seen in subsequent seasonal lags. Of the ~21 lags shown, at a 95% confidence level one would expect roughly 1 spike in each plot to be erroneously significant. For this reason the marginally significant lags (lag 11 in the ACF plot and lag 15 in the PACF plot) are ignored. Consequently, P and Q are expected to be at most 1, supporting a candidate model such as ARIMA(0,0,0)(1,1,1)[7].
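The 95% significance bound drawn on these plots is approximately +/-1.96/sqrt(n); with 82 training observations reduced by one seasonal difference, the bound works out as:

```r
#Approximate 95% significance bound for the ACF/PACF spikes
n <- 82 - 7 #82 training points minus the 7 lost to the seasonal difference
bound <- 1.96 / sqrt(n)
round(bound, 3) #0.226
```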

The auto.arima function is then used in order to select the optimal orders based on the information criteria.

auto.arima(cali1_train, trace = FALSE, approximation = FALSE, stepwise = FALSE, ic = c("aicc", "aic", "bic")) 
## Series: cali1_train 
## ARIMA(1,0,0)(0,1,1)[7] 
## 
## Coefficients:
##          ar1     sma1
##       0.1793  -0.6995
## s.e.  0.1187   0.1229
## 
## sigma^2 estimated as 699.8:  log likelihood=-353.43
## AIC=712.85   AICc=713.19   BIC=719.81

The “best” model is given by ARIMA(1,0,0)(0,1,1)[7]. The seasonal portion is easily reconciled with the analysis above - there is a spike in the ACF at lag 7 and an exponential decay pattern in the seasonal lags of the PACF. However, the p=1 component is a surprise, as one would expect significant spikes in the early lags of the PACF to support it. Nonetheless, this model is selected as a candidate to be tested further.

The above results, based on the information criteria, are summarised as follows:

  1. Best Model: ARIMA(1,0,0)(0,1,1)[7]

  2. Second Best Model: ARIMA(0,0,1)(0,1,1)[7]

Since the model expected from observing the ACF and PACF plots is amongst the top performers, it is also selected as a candidate model.

  3. Model Expected from PACF/ACF Plots: ARIMA(0,0,0)(1,1,1)[7]

#Coding the three candidate models
cali.m1<-Arima(cali1_train, order = c(1,0,0), seasonal = list(order = c(0, 1, 1), period=7))
cali1.m2<-Arima(cali1_train, order = c(0,0,1), seasonal = list(order = c(0, 1, 1), period=7))
cali1.m3<-Arima(cali1_train, order = c(0,0,0), seasonal = list(order = c(1, 1, 1), period=7))

3.13 Verification

In Sample Performance

This section begins with an evaluation of the in-sample performance of the models above:

cali1_in_sample<-as.data.frame(accuracy(cali.m1))
cali1_in_sample<-rbind(cali1_in_sample,accuracy(cali1.m2))
cali1_in_sample<-rbind(cali1_in_sample, accuracy(cali1.m3))
cali1_in_sample$Model<- c("Model 1", "Model 2", "Model 3")
cali1_in_sample<- cali1_in_sample %>% select(Model, everything())
row.names(cali1_in_sample) <- NULL
cali1_in_sample %>% kable(caption = "In Sample Performance") %>%  kable_styling("striped")
In Sample Performance

| Model   | ME         | RMSE     | MAE      | MPE       | MAPE     | MASE      | ACF1      |
|---------|------------|----------|----------|-----------|----------|-----------|-----------|
| Model 1 | -0.7949027 | 24.95910 | 18.28022 | -2.980306 | 13.47389 | 0.6903407 | 0.0052265 |
| Model 2 | -0.8097159 | 24.98450 | 18.30272 | -2.989476 | 13.52350 | 0.6911902 | 0.0072108 |
| Model 3 | -1.2879451 | 25.54684 | 18.88651 | -3.500201 | 14.10580 | 0.7132369 | 0.1701267 |

From the results in the table above, Model 1 performs the best on all metrics. This outcome is not surprising, as it was the model selected by a range of information criteria, including BIC. These criteria, especially the BIC, select models by balancing fit and parsimony - consequently, one would expect the model selected by this procedure to exhibit the best in-sample performance.

Nonetheless, the out of sample performance of these three models will be tested in order to evaluate how well they generalise on unseen data.

Out of Sample Performance

When evaluating the out-of-sample performance of ARIMA models, there are two approaches: one based on one-step-ahead forecasts and another based on multi-step-ahead forecasts. In reality, as lettuce is fresh produce and there are likely to be many suppliers, one could argue that fast-food chains are unlikely to place lettuce orders far in advance - reflective of a supply chain that is likely to be quite responsive. However, as the output of this report is a forecast for the 14 days following the end of the available data, performance on the multi-step-ahead forecast is prioritised.
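For completeness, a one-step-ahead evaluation can be obtained by re-applying a fitted model to the test series without re-estimating its coefficients, via the model argument of Arima; a sketch on simulated stand-ins for the store series (not the project data):

```r
library(forecast)

#Simulated stand-ins for a training and test series
set.seed(2)
train <- ts(rnorm(82, mean = 150, sd = 20), frequency = 7)
test  <- ts(rnorm(21, mean = 150, sd = 20), frequency = 7, start = c(12, 6))

#Fit on the training set, then hold the coefficients fixed on the test set
fit   <- Arima(train, order = c(1, 0, 0), seasonal = list(order = c(0, 1, 1), period = 7))
refit <- Arima(test, model = fit) #model= reuses the estimated coefficients

#fitted() on the refit object gives one-step-ahead forecasts of the test set
accuracy(fitted(refit), test)
```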

#Out of sample performance 
cali1_out_sample<-as.data.frame(accuracy(forecast(cali.m1, h = 20), cali1_test))
cali1_out_sample<-rbind(cali1_out_sample,accuracy(forecast(cali1.m2, h = 20), cali1_test))
cali1_out_sample<-rbind(cali1_out_sample,accuracy(forecast(cali1.m3, h = 20), cali1_test))
cali1_out_sample$Model<- c("Model 1","Model 1", "Model 2", "Model 2","Model 3","Model 3")
cali1_out_sample<- cali1_out_sample %>% select(Model, everything())
cali1_out_sample %>% kable(caption = "Out of Sample Performance") %>%  kable_styling("striped")
Out of Sample Performance

| Model   | Set          | ME         | RMSE     | MAE      | MPE       | MAPE     | MASE      | ACF1      | Theil’s U |
|---------|--------------|------------|----------|----------|-----------|----------|-----------|-----------|-----------|
| Model 1 | Training set | -0.7949027 | 24.95910 | 18.28022 | -2.980306 | 13.47389 | 0.6903407 | 0.0052265 | NA        |
| Model 1 | Test set     |  7.6579755 | 32.69719 | 23.53935 |  1.828789 | 17.15170 | 0.8889481 | 0.1942585 | 0.4717754 |
| Model 2 | Training set | -0.8097159 | 24.98450 | 18.30272 | -2.989476 | 13.52350 | 0.6911902 | 0.0072108 | NA        |
| Model 2 | Test set     |  7.5579040 | 32.87697 | 23.56221 |  1.772784 | 17.17768 | 0.8898115 | 0.1947272 | 0.4746790 |
| Model 3 | Training set | -1.2879451 | 25.54684 | 18.88651 | -3.500201 | 14.10580 | 0.7132369 | 0.1701267 | NA        |
| Model 3 | Test set     | 11.1635471 | 36.15826 | 25.96754 |  4.386626 | 18.53260 | 0.9806472 | 0.1981617 | 0.5252240 |
#Plot Forecasts
cali1_m1_plot<-autoplot(forecast(cali.m1, h = 20))
cali1_m2_plot<-autoplot(forecast(cali1.m2, h = 20))
cali1_m3_plot<-autoplot(forecast(cali1.m3, h = 20))
grid.arrange(cali1_m1_plot, cali1_m2_plot, cali1_m3_plot, nrow = 2)

As expected, the performance of all the models is worse on the test set than on the training set. Model 1 continues to perform the best across most of the evaluation metrics. However, special attention is paid to the RMSE due to its sensitivity to large errors. For this reason, Model 1 is selected over the other models evaluated.

However, before Model 1 is formally selected and used to make predictions for the subsequent 14 days, its residuals are checked:

#Check Residuals
checkresiduals(cali.m1)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(1,0,0)(0,1,1)[7]
## Q* = 15.376, df = 12, p-value = 0.2215
## 
## Model df: 2.   Total lags used: 14

Ideally, there should be no autocorrelation in the residuals; this is evidenced by a “white-noise” pattern in the residual time series, no significant spikes in the ACF plot and normally distributed residuals with mean 0. Furthermore, these checks are supported by the Ljung-Box test, whose null hypothesis is that the model does not show a lack of fit.

The residual time series above generally exhibits a white-noise-like pattern, although there appears to be a slight downward trajectory at the later lags. There are two significant spikes in the ACF plot; given random variation and how little they exceed the threshold, these are deemed acceptable. Finally, the Ljung-Box test returns a p-value of 0.2215, indicating that the null hypothesis cannot be rejected. Model 1 is therefore considered satisfactory and will be compared against the Holt-Winters model to decide which should be used for the final prediction.
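The Ljung-Box statistic reported by checkresiduals() can also be computed directly with stats::Box.test, using the same lag = 14 and fitdf = 2 (the two estimated coefficients), giving 14 - 2 = 12 degrees of freedom; here on simulated white noise standing in for the residuals:

```r
#White-noise stand-in for the model residuals, for illustration only
set.seed(3)
res <- rnorm(82)

#Ljung-Box test on 14 autocorrelations, adjusting for 2 fitted parameters
Box.test(res, lag = 14, type = "Ljung-Box", fitdf = 2)
```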

3.2 Cali2 Store

3.21 Stationarize The Time Series

Train/Test Split: As before, the dataset is split into a training and test set. There are 95 observations in the Cali2 dataset and subsequent analyses are performed with an 80/20 train-test split. The models are trained on data from the 13th of March 2015 to the 27th of May 2015 (1st observation to the 76th).

cali2_train<-cali2[1:76,] #Train on the first 76 observations
cali2_train<-ts(cali2_train[,2],frequency = 7, start=c(03,13)) #Make ts object
cali2_test<-cali2[77:dim(cali2)[1],] #Test on from 77th to last
cali2_test<-ts(cali2_test[,2],frequency = 7, start=c(15,5)) #Make ts object

The plots of the timeseries, ACF and PACF are observed:

ggtsdisplay(cali2_train) #Look at timeseries, ACF, PACF plots for Cali2 Store

As expected, the ACF plot shows significant spikes occurring at lag 7 and its multiples (14 and 21), confirming the weekly seasonality.

The trend stationarity of the series above is confirmed using the ndiffs() function - as expected, it returns 0. The nsdiffs() function is then used for the seasonal differences.

ndiffs(cali2_train) #Number of first order differences required to stationarise the time series.
## [1] 0
nsdiffs(cali2_train)
## [1] 1

As seen above, one seasonal difference needs to be taken.

cali2_diff<-diff(cali2_train,lag=7)

This new differenced object is visually observed:

autoplot(cali2_diff) + labs(y="Differenced Lettuce Demand for Cali 2")

adf.test(cali2_diff) 
## 
##  Augmented Dickey-Fuller Test
## 
## data:  cali2_diff
## Dickey-Fuller = -2.8934, Lag order = 4, p-value = 0.2121
## alternative hypothesis: stationary
pp.test(cali2_diff) 
## Warning in pp.test(cali2_diff): p-value smaller than printed p-value
## 
##  Phillips-Perron Unit Root Test
## 
## data:  cali2_diff
## Dickey-Fuller Z(alpha) = -68.18, Truncation lag parameter = 3, p-value
## = 0.01
## alternative hypothesis: stationary
kpss.test(cali2_diff) 
## 
##  KPSS Test for Level Stationarity
## 
## data:  cali2_diff
## KPSS Level = 0.50699, Truncation lag parameter = 3, p-value = 0.04009

The conclusion from the ADF and KPSS tests is that the time series is still not stationary. However, the PP test suggests that the series is in fact stationary, as its null hypothesis is rejected with a p-value of 0.01. Observing the differenced time series plot above, there appears to be a slight downward trend, supporting the case for taking a first-order (trend) difference.

#Take a trend difference on the already seasonally differenced cali2_diff. How many differences to take?
ndiffs(cali2_diff)
## [1] 1
#Take a trend difference on the already seasonally differenced cali2_diff
cali2_diff_2<-diff(cali2_diff, differences=1)

The ADF, PP and KPSS tests are then performed again to ensure stationarity.

#ADF Test
adf.test(cali2_diff_2)
## Warning in adf.test(cali2_diff_2): p-value smaller than printed p-value
## 
##  Augmented Dickey-Fuller Test
## 
## data:  cali2_diff_2
## Dickey-Fuller = -5.1688, Lag order = 4, p-value = 0.01
## alternative hypothesis: stationary
#PP Test
pp.test(cali2_diff_2)
## Warning in pp.test(cali2_diff_2): p-value smaller than printed p-value
## 
##  Phillips-Perron Unit Root Test
## 
## data:  cali2_diff_2
## Dickey-Fuller Z(alpha) = -93.446, Truncation lag parameter = 3, p-value
## = 0.01
## alternative hypothesis: stationary
#KPSS Test
kpss.test(cali2_diff_2)
## Warning in kpss.test(cali2_diff_2): p-value greater than printed p-value
## 
##  KPSS Test for Level Stationarity
## 
## data:  cali2_diff_2
## KPSS Level = 0.075925, Truncation lag parameter = 3, p-value = 0.1

As all three tests agree that the doubly-differenced series (cali2_diff_2) is stationary, the ACF/PACF analysis is performed on it.

3.22 Parameter Estimation

#ACF
cali2_acf<-acf(cali2_diff_2, lag.max=21, plot=FALSE)
plot(cali2_acf, main= "Cali 2 Store ACF")

#PACF
cali2_pacf<-pacf(cali2_diff_2, lag.max=21, plot=FALSE)
plot(cali2_pacf, main= "Cali 2 Store PACF")

From the ACF plot, there are significant spikes at lags 1 and 7. This suggests a non-seasonal MA(1) component as well as a seasonal MA(1) component. Similarly, the significant spike at lag 1 of the PACF is suggestive of an AR(1). None of the spikes at the seasonal lags of the PACF plot is significant.

The auto.arima function is run in order to return the optimal model order, based on the information criteria.

auto.arima(cali2_train, trace = FALSE, approximation = FALSE, stepwise = FALSE, ic = c("aicc", "aic", "bic")) 
## Series: cali2_train 
## ARIMA(0,1,1)(0,1,1)[7] 
## 
## Coefficients:
##           ma1     sma1
##       -0.8413  -0.5869
## s.e.   0.0720   0.1838
## 
## sigma^2 estimated as 2248:  log likelihood=-360.16
## AIC=726.31   AICc=726.69   BIC=732.97

The best model is given by ARIMA(0,1,1)(0,1,1)[7] and the second best by ARIMA(0,1,1)(1,1,1)[7]. This outcome is consistent with the analysis above - given the significant spike at lag 1 of the ACF, an MA(1) component is not surprising. Additionally, the seasonal MA(1) component is explained by the significant spike at lag 7 in the ACF.

The “best” model and the second best model are then evaluated on their in-sample and out-of-sample performance.

#Coding the two candidate models
cali2.m1<-Arima(cali2_train, order = c(0,1,1), seasonal = list(order = c(0, 1, 1), period=7))
cali2.m2<-Arima(cali2_train, order = c(0,1,1), seasonal = list(order = c(1, 1, 1), period=7))

3.23 Verification

In Sample Performance:

cali2_in_sample<-as.data.frame(accuracy(cali2.m1))
cali2_in_sample<-rbind(cali2_in_sample,accuracy(cali2.m2))
cali2_in_sample$Model<- c("Model 1", "Model 2")
cali2_in_sample<- cali2_in_sample %>% select(Model, everything())
row.names(cali2_in_sample) <- NULL
cali2_in_sample %>% kable(caption = "In Sample Performance") %>%  kable_styling("striped")
In Sample Performance

| Model   | ME        | RMSE     | MAE      | MPE       | MAPE     | MASE      | ACF1       |
|---------|-----------|----------|----------|-----------|----------|-----------|------------|
| Model 1 | -7.638800 | 44.18003 | 31.90097 | -3.637262 | 10.72865 | 0.7443920 | -0.1074127 |
| Model 2 | -2.973974 | 24.45665 | 18.77466 | -4.563226 | 14.41179 | 0.7090127 |  0.0736023 |

The results are mixed; however, Model 2 outperforms Model 1 on RMSE. The out-of-sample performance of these two models is also assessed.

Out of Sample Performance:

#Out of sample performance 
cali2_out_sample<-as.data.frame(accuracy(forecast(cali2.m1, h = 19), cali2_test))
cali2_out_sample<-rbind(cali2_out_sample,accuracy(forecast(cali2.m2, h = 19), cali2_test))
cali2_out_sample$Model<- c("Model 1","Model 1", "Model 2", "Model 2")
cali2_out_sample<- cali2_out_sample %>% select(Model, everything())
cali2_out_sample %>% kable(caption = "Out of Sample Performance") %>%  kable_styling("striped")
Out of Sample Performance

| Model   | Set          | ME         | RMSE      | MAE       | MPE       | MAPE     | MASE      | ACF1       | Theil’s U |
|---------|--------------|------------|-----------|-----------|-----------|----------|-----------|------------|-----------|
| Model 1 | Training set |  -7.638800 |  44.18003 |  31.90097 | -3.637262 | 10.72865 | 0.7443920 | -0.1074127 | NA        |
| Model 1 | Test set     |  71.572809 |  94.50028 |  74.24301 | 21.922535 | 23.10160 | 1.7324206 | -0.3237485 | 1.059464  |
| Model 2 | Training set |  -2.973974 |  24.45665 |  18.77466 | -4.563226 | 14.41179 | 0.7090127 |  0.0736023 | NA        |
| Model 2 | Test set     | 194.415590 | 200.60710 | 194.41559 | 65.133028 | 65.13303 | 7.3419785 |  0.0662369 | 2.019972  |
#Plot Forecasts
cali2_m1_plot<-autoplot(forecast(cali2.m1, h = 19))
cali2_m2_plot<-autoplot(forecast(cali2.m2, h = 19))
grid.arrange(cali2_m1_plot, cali2_m2_plot, nrow = 1)

As Model 1 outperforms Model 2 on the out of sample test, it is selected. Finally, its residuals are checked.

checkresiduals(cali2.m1)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(0,1,1)(0,1,1)[7]
## Q* = 16.008, df = 12, p-value = 0.1909
## 
## Model df: 2.   Total lags used: 14

Model 1 provides a satisfactory fit for the data: the time series plot of the residuals displays a white-noise pattern and the residuals are normally distributed with mean zero. The spike at lag 8 crosses the threshold; with about 21 lags shown, this can be attributed to random error, as ~1 spike is expected to be erroneously significant at the 95% level. Finally, this is supported by the large p-value of about 0.19 in the Ljung-Box test.

3.3 NY1 Store

3.31 Stationarize The Time Series

Train/Test Split: The 103 observations are split into a training/test set using the 80-20 split.

ny1_train<-ny1[1:82,] #Train on first 82 observations
ny1_train<-ts(ny1_train[,2],frequency = 7, start=c(03,05)) #Make ts object
ny1_test<-ny1[83:dim(ny1)[1],] #Test on from 83rd to last
ny1_test<-ts(ny1_test[,2],frequency = 7, start=c(15,3)) #Make ts object

The plots of the timeseries, ACF and PACF are observed:

ggtsdisplay(ny1_train) #Look at timeseries, ACF, PACF plots for NY1 Store

As mentioned earlier, the mean and variance of the NY1 time series do not appear constant (both actually increase over time), indicative of a multiplicative seasonal component. This makes the series a candidate for a Box-Cox/logarithmic transformation.

ny1_train_transformed<-log(ny1_train)
autoplot(ny1_train_transformed) + labs(y="Log(Lettuce Demand for NY 1)")

The magnitude of the variations appears more stable after the transformation.

#How many differences need to be taken in order to stationarise
ndiffs(ny1_train_transformed)
## [1] 1
#How many seasonal differences need to be taken in order to stationarise
nsdiffs(ny1_train_transformed)
## [1] 0

Consequently, the timeseries is differenced as follows:

#How many differences need to be taken in order to stationarise
ny1_train_transformed_diff<-diff(ny1_train_transformed, differences = 1)

The stationarity tests are performed to ensure the differenced series is in fact stationary.

adf.test(ny1_train_transformed_diff)
## Warning in adf.test(ny1_train_transformed_diff): p-value smaller than printed p-
## value
## 
##  Augmented Dickey-Fuller Test
## 
## data:  ny1_train_transformed_diff
## Dickey-Fuller = -7.2516, Lag order = 4, p-value = 0.01
## alternative hypothesis: stationary
pp.test(ny1_train_transformed_diff)
## Warning in pp.test(ny1_train_transformed_diff): p-value smaller than printed p-
## value
## 
##  Phillips-Perron Unit Root Test
## 
## data:  ny1_train_transformed_diff
## Dickey-Fuller Z(alpha) = -104.45, Truncation lag parameter = 3, p-value
## = 0.01
## alternative hypothesis: stationary
kpss.test(ny1_train_transformed_diff)
## Warning in kpss.test(ny1_train_transformed_diff): p-value greater than printed
## p-value
## 
##  KPSS Test for Level Stationarity
## 
## data:  ny1_train_transformed_diff
## KPSS Level = 0.023637, Truncation lag parameter = 3, p-value = 0.1

All three tests indicate that the time series has been stationarised.

3.32 Parameter Estimation

Possible models will be of the form ARIMA(p,1,q)(P,0,Q)[7]. To identify possible orders, the ACF and PACF plots are analysed.

#ACF
ny1_acf<-acf(ny1_train_transformed_diff, lag.max=21, plot=FALSE)
plot(ny1_acf, main= "NY 1 Store ACF")

#PACF
ny1_pacf<-pacf(ny1_train_transformed_diff, lag.max=21, plot=FALSE)
plot(ny1_pacf, main= "NY 1 Store PACF")

From the ACF plot, the significant spikes at lag 1 and lag 7 are indicative of orders q and Q of <=1. Meanwhile, in the PACF plot the lags appear to stop being significant after lag 6; consequently, orders of p <=5 are expected.

The auto.arima function is then used in order to select the optimal orders based on the information criteria.

auto.arima(ny1_train, lambda = 0, trace = FALSE, approximation = FALSE, stepwise = FALSE, ic = c("aicc", "aic", "bic")) 
## Series: ny1_train 
## ARIMA(0,1,1)(2,0,0)[7] 
## Box Cox transformation: lambda= 0 
## 
## Coefficients:
##           ma1    sar1    sar2
##       -0.9190  0.2788  0.2046
## s.e.   0.0445  0.1127  0.1201
## 
## sigma^2 estimated as 0.02328:  log likelihood=37.44
## AIC=-66.88   AICc=-66.35   BIC=-57.3

The best model is given by ARIMA(0,1,1)(2,0,0)[7] and the second best model is given by ARIMA(0,1,1)(1,0,0)[7]. These two candidate models will be evaluated, in order to judge their out of sample performance.

Bias Adjustment: An issue that arises with the Box-Cox transformation is that the back-transformed point forecast is the median of the forecast distribution rather than the mean. Considering that the fast-food chain would most likely sum the forecast demand across restaurants in order to plan lettuce delivery/inventory, the mean is preferred, since means add up while medians do not. This is achieved by setting the biasadj argument of the Arima function to TRUE.

ny1.m1<-Arima(ny1_train, order = c(0,1,1), seasonal = list(order = c(2, 0, 0), period=7), lambda = 0, include.mean = FALSE, biasadj = TRUE)
ny1.m2<-Arima(ny1_train, order = c(0,1,1), seasonal = list(order = c(1, 0, 0), period=7), lambda = 0, include.mean = FALSE, biasadj = TRUE)
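Numerically, the biasadj correction for a log transformation amounts to adding half the forecast variance on the log scale before back-transforming: exp(f) is the median of the lognormal forecast distribution, while exp(f + sigma^2/2) is its mean. Toy numbers for illustration, not model output:

```r
#Toy log-scale forecast and variance, for illustration only
f      <- log(400)  #point forecast on the log scale
sigma2 <- 0.023     #forecast variance on the log scale

median_fc <- exp(f)              #back-transform alone gives the median: 400
mean_fc   <- exp(f + sigma2 / 2) #bias-adjusted mean, slightly larger
```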

3.33 Verification

In Sample Performance

ny1_in_sample<-as.data.frame(accuracy(ny1.m1))
ny1_in_sample<-rbind(ny1_in_sample,accuracy(ny1.m2))
ny1_in_sample$Model<- c("Model 1", "Model 2")
ny1_in_sample<- ny1_in_sample %>% select(Model, everything())
row.names(ny1_in_sample) <- NULL
ny1_in_sample %>% kable(caption = "In Sample Performance") %>%  kable_styling("striped")
In Sample Performance

| Model   | ME       | RMSE     | MAE      | MPE        | MAPE     | MASE      | ACF1       |
|---------|----------|----------|----------|------------|----------|-----------|------------|
| Model 1 | 2.514712 | 39.04974 | 29.64812 | -1.0358261 | 11.50951 | 0.8337491 | -0.0233680 |
| Model 2 | 3.358640 | 39.82973 | 30.26919 | -0.8005959 | 11.68300 | 0.8512146 | -0.0043247 |

Model 1 performs best on most of the metrics, especially on the RMSE. The models are subsequently evaluated on their out-of-sample performance.

Out of Sample Performance:

#Out of sample performance 
ny1_out_sample<-as.data.frame(accuracy(forecast(ny1.m1, h = 20), ny1_test))
ny1_out_sample<-rbind(ny1_out_sample,accuracy(forecast(ny1.m2, h = 20), ny1_test))
ny1_out_sample$Model<- c("Model 1","Model 1", "Model 2", "Model 2")
ny1_out_sample<- ny1_out_sample %>% select(Model, everything())
ny1_out_sample %>% kable(caption = "Out of Sample Performance") %>%  kable_styling("striped")
Out of Sample Performance
Model ME RMSE MAE MPE MAPE MASE ACF1 Theil’s U
Training set Model 1 2.514712 39.04974 29.64812 -1.0358261 11.50951 0.8337491 -0.0233680 NA
Test set Model 1 11.993375 48.73665 39.11604 1.5721260 13.42434 1.1000011 0.3046582 0.7664256
Training set1 Model 2 3.358640 39.82973 30.26919 -0.8005959 11.68300 0.8512146 -0.0043247 NA
Test set1 Model 2 10.998585 51.56242 41.72472 0.9421743 14.45553 1.1733611 0.3068557 0.8275730
#Plot Forecasts
ny1_m1_plot<-autoplot(forecast(ny1.m1, h = 20))
ny1_m2_plot<-autoplot(forecast(ny1.m2, h = 20))
grid.arrange(ny1_m1_plot, ny1_m2_plot, nrow = 1)

As expected, both models perform worse on the test set. Nonetheless, Model 1 outperforms Model 2 on most performance metrics, especially the RMSE, and is therefore selected.

Finally, the residuals from Model 1 are evaluated:

checkresiduals(ny1.m1)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(0,1,1)(2,0,0)[7]
## Q* = 7.3478, df = 11, p-value = 0.7703
## 
## Model df: 3.   Total lags used: 14

By all indications, Model 1 provides a satisfactory fit for the data: the time series plot of the residuals resembles white noise, none of the spikes in the ACF plot is significant, and the residuals are approximately normally distributed with mean zero. This is supported by the large p-value of the Ljung-Box test.
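
The degrees of freedom reported above (df = 11) are the 14 lags minus the 3 estimated model parameters. This bookkeeping can be reproduced with base R's Box.test; the residual series below is simulated white noise of an illustrative length, not the model's actual residuals.

```r
# Ljung-Box test with base R (simulated stand-in residuals)
set.seed(1)
res <- rnorm(73)   # stand-in residual series, illustrative length only
bt  <- Box.test(res, lag = 14, type = "Ljung-Box", fitdf = 3)
unname(bt$parameter)   # df = lag - fitdf = 14 - 3 = 11
```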

3.4 NY2 Store

3.41 Stationarize The Time Series

Train/Test Split: There are 88 observations in the NY2 dataset; they are split roughly 80/20 into training and test sets.
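
The arithmetic behind the indices used below:

```r
# 80/20 split of the 88 NY2 observations
n       <- 88
train_n <- floor(0.8 * n)   # 70 observations for training
test_n  <- n - train_n      # 18 observations held out (the forecast horizon h)
c(train_n, test_n)
```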

ny2_train<-ny2[1:70,] #Train on first 70 observations
ny2_train<-ts(ny2_train[,2],frequency = 7, start=c(03,20)) #Make ts object
ny2_test<-ny2[71:dim(ny2)[1],] #Test on observations 71 through the last
ny2_test<-ts(ny2_test[,2],frequency = 7, start=c(15,6)) #Make ts object

The plot of the timeseries, ACF and PACF are observed:

ggtsdisplay(ny2_train) #Look at timeseries, ACF, PACF plots for NY2 Store

The timeseries above appears to be trend stationary. This is verified with the ndiffs function, which reports how many differences are required to stationarise the series.

#How many differences need to be taken in order to stationarise
ndiffs(ny2_train)
## [1] 0

No differences need to be taken.

#How many seasonal differences need to be taken in order to stationarise
nsdiffs(ny2_train)
## [1] 0

Consequently, the time series appears to be stationary in terms of both trend and seasonality, so candidate models are expected to be of the form ARIMA(p,0,q)(P,0,Q)[7]. Stationarity is confirmed by performing the ADF, PP and KPSS tests.

adf.test(ny2_train) #ADF Test
## 
##  Augmented Dickey-Fuller Test
## 
## data:  ny2_train
## Dickey-Fuller = -3.4602, Lag order = 4, p-value = 0.05333
## alternative hypothesis: stationary
pp.test(ny2_train) #PP Test
## Warning in pp.test(ny2_train): p-value smaller than printed p-value
## 
##  Phillips-Perron Unit Root Test
## 
## data:  ny2_train
## Dickey-Fuller Z(alpha) = -47.388, Truncation lag parameter = 3, p-value
## = 0.01
## alternative hypothesis: stationary
kpss.test(ny2_train) #KPSS Test
## Warning in kpss.test(ny2_train): p-value greater than printed p-value
## 
##  KPSS Test for Level Stationarity
## 
## data:  ny2_train
## KPSS Level = 0.25303, Truncation lag parameter = 3, p-value = 0.1

Note that the ADF and PP tests take a unit root (non-stationarity) as the null hypothesis, whereas the KPSS test takes stationarity as the null. The PP test rejects the unit root and the KPSS test fails to reject stationarity, both confirming stationarity. Although the p-value of the ADF test is slightly above the 5% threshold, the series is nonetheless treated as stationary.

3.42 Parameter Estimation

#ACF
ny2_acf<-acf(ny2_train, lag.max=21, plot=FALSE)
plot(ny2_acf, main= "NY 2 Store ACF")

#PACF
ny2_pacf<-pacf(ny2_train, lag.max=21, plot=FALSE)
plot(ny2_pacf, main= "NY 2 Store PACF")

The significant spikes in the ACF at lag 1 and lag 7 suggest q <= 1 and Q <= 1. In the PACF plot, the spike at lag 1 is significant, while the spike at lag 7 appears to lie just at the significance boundary.
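
As a sanity check of this style of reading (simulated data with an illustrative AR coefficient, not the store data), an AR(1) process produces exactly this kind of single dominant PACF spike at lag 1:

```r
# PACF signature of a simulated AR(1) process (illustration only)
set.seed(1)
x <- arima.sim(model = list(ar = 0.6), n = 500)
p <- pacf(x, lag.max = 10, plot = FALSE)$acf
p[1] > 2 / sqrt(500)   # lag-1 spike clears the ~95% significance band
```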

The auto.arima function is then used in order to select the optimal orders based on the information criteria.

auto.arima(ny2_train, trace = FALSE, approximation = FALSE, stepwise = FALSE, ic = c("aicc", "aic", "bic")) 
## Series: ny2_train 
## ARIMA(1,0,0)(1,0,0)[7] with non-zero mean 
## 
## Coefficients:
##          ar1    sar1      mean
##       0.2595  0.3230  218.4136
## s.e.  0.1167  0.1195   11.0479
## 
## sigma^2 estimated as 2443:  log likelihood=-371.24
## AIC=750.48   AICc=751.1   BIC=759.48

The best model is given by ARIMA(1,0,0)(1,0,0)[7] with non-zero mean and the second best by ARIMA(0,0,1)(1,0,0)[7] with non-zero mean. The AR(1) component in the best model is explained by the significant spike in the PACF at lag 1, while the seasonal AR(1) is explained by the spike at lag 7 in the PACF that reaches the significance boundary.
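
The same specification can be sketched with base R's stats::arima on a synthetic weekly series (forecast::Arima, used below, provides a compatible interface; the data and AR coefficient here are illustrative only):

```r
# ARIMA(1,0,0)(1,0,0)[7] with non-zero mean on synthetic data
set.seed(11)
y   <- ts(200 + arima.sim(model = list(ar = 0.3), n = 70), frequency = 7)
fit <- arima(y, order = c(1, 0, 0),
             seasonal = list(order = c(1, 0, 0), period = 7))
names(coef(fit))   # "ar1" "sar1" "intercept"
```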

#Coding the two candidate models
ny2.m1<-Arima(ny2_train, order = c(1,0,0), seasonal = list(order = c(1, 0, 0), period=7))
ny2.m2<-Arima(ny2_train, order = c(0,0,1), seasonal = list(order = c(1, 0, 0), period=7))

3.43 Verification

ny2_in_sample<-as.data.frame(accuracy(ny2.m1))
ny2_in_sample<-rbind(ny2_in_sample,accuracy(ny2.m2))
ny2_in_sample$Model<- c("Model 1", "Model 2")
ny2_in_sample<- ny2_in_sample %>% select(Model, everything())
row.names(ny2_in_sample) <- NULL
ny2_in_sample %>% kable(caption = "In Sample Performance") %>%  kable_styling("striped")
In Sample Performance
Model ME RMSE MAE MPE MAPE MASE ACF1
Model 1 1.304185 48.35115 38.73376 -18.79179 33.82831 0.8172227 -0.0171058
Model 2 1.312231 48.61274 38.78247 -18.94127 33.96957 0.8182503 0.0247216

As seen, Model 1 outperforms Model 2 in sample; the out-of-sample performance of both models is tested next.

#Out of sample performance 
ny2_out_sample<-as.data.frame(accuracy(forecast(ny2.m1, h = 18), ny2_test))
ny2_out_sample<-rbind(ny2_out_sample,accuracy(forecast(ny2.m2, h = 18), ny2_test))
ny2_out_sample$Model<- c("Model 1","Model 1", "Model 2", "Model 2")
ny2_out_sample<- ny2_out_sample %>% select(Model, everything())
ny2_out_sample %>% kable(caption = "Out of Sample Performance") %>%  kable_styling("striped")
Out of Sample Performance
Model ME RMSE MAE MPE MAPE MASE ACF1 Theil’s U
Training set Model 1 1.304185 48.35115 38.73376 -18.791794 33.82831 0.8172227 -0.0171058 NA
Test set Model 1 -2.033126 49.24959 32.28410 -5.678179 15.07788 0.6811449 -0.0325667 0.7378221
Training set1 Model 2 1.312231 48.61274 38.78247 -18.941275 33.96957 0.8182503 0.0247216 NA
Test set1 Model 2 -1.826844 49.12370 32.22283 -5.550662 15.02731 0.6798520 -0.0368099 0.7350267

The performance of Model 1 and Model 2 is quite close on the out-of-sample test. However, Model 2 is selected over Model 1 due to its slightly better RMSE on the test set.

#Plot Forecasts
ny2_m1_plot<-autoplot(forecast(ny2.m1, h = 18))
ny2_m2_plot<-autoplot(forecast(ny2.m2, h = 18))
grid.arrange(ny2_m1_plot, ny2_m2_plot, nrow = 1)

Finally, the residuals from Model 2 are evaluated:

checkresiduals(ny2.m2)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(0,0,1)(1,0,0)[7] with non-zero mean
## Q* = 8.0122, df = 11, p-value = 0.7122
## 
## Model df: 3.   Total lags used: 14

Model 2 provides a satisfactory fit for the data (in spite of the long tail in the histogram): the time series plot of the residuals resembles white noise, none of the spikes in the ACF plot is significant, and the residuals are approximately normally distributed with mean zero. This is supported by the large p-value of the Ljung-Box test.

4. Holt-Winters Models

In this portion of the analysis, the time series is modelled using the ets() function rather than the HoltWinters() function, because ets() optimises the initial states and smoothing parameters by maximising the likelihood function. Three steps are followed:

  1. Visually inspect the timeseries for any indication of trend/seasonality and if they are additive or multiplicative.

  2. Run the ets() function with the “model” argument set to “ZZZ” to determine the best model. The best model from this process is compared with the expected model from (1).

  3. Model evaluation on in and out-of-sample performance.

4.1 Cali 1 Store

4.11 Inspect The timeseries

The timeseries is decomposed into its component parts - seasonal, trend and remainder - using Seasonal and Trend decomposition with Loess (STL). One particular advantage of STL is that it can handle a range of seasonalities, not just monthly or quarterly ones. Finally, the same train/test split defined in the ARIMA section above is used here.

cali1_stl<-cali1_train[,1]
stl(cali1_stl,s.window ="period") %>% autoplot 

In STL plots, the grey bars indicate the relative importance of each component: the smaller the bar, the larger that component's contribution to the original series. The seasonal component plays the most important part, whereas the large grey bar for the trend component indicates that its contribution is marginal. Consequently, the expected ETS model is “A,N,A”, with additive error and seasonal components and no trend.
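
The decomposition can be reproduced in miniature with base R's stl() on a synthetic weekly series (simulated here with strong seasonality and no trend; not the Cali 1 data):

```r
# STL decomposition of a synthetic weekly series (illustration only)
set.seed(7)
y <- ts(150 + 20 * sin(2 * pi * (1:84) / 7) + rnorm(84, sd = 3),
        frequency = 7)
dec   <- stl(y, s.window = "periodic")
parts <- dec$time.series   # columns: seasonal, trend, remainder
var(parts[, "seasonal"]) > var(parts[, "remainder"])   # seasonality dominates
```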

4.12 Determination of Best Model

cali1_ets<-ets(cali1_train, model="ZZZ") #Evaluate the best model
cali1_ets
## ETS(A,N,A) 
## 
## Call:
##  ets(y = cali1_train, model = "ZZZ") 
## 
##   Smoothing parameters:
##     alpha = 0.1223 
##     gamma = 1e-04 
## 
##   Initial states:
##     l = 144.2834 
##     s = 31.9515 15.5058 26.5879 24.4611 -70.2816 -46.5225
##            18.2978
## 
##   sigma:  24.6121
## 
##      AIC     AICc      BIC 
## 897.1487 900.2473 921.2159

As seen above, the “best model” is consistent with the expected “A,N,A”.

4.13 In/Out-of-Sample Performance

cali1_ets_out_sample<-as.data.frame(accuracy(forecast(cali1_ets, h = 20), cali1_test))
cali1_ets_out_sample %>% kable(caption = "In/Out-of-Sample Performance") %>%  kable_styling("striped")
In/Out-of-Sample Performance
ME RMSE MAE MPE MAPE MASE ACF1 Theil’s U
Training set -1.861661 23.22218 18.45667 -4.154372 14.30555 0.6970043 0.0562287 NA
Test set 19.416481 33.28287 25.92539 12.463420 18.22348 0.9790556 0.2157657 0.4540012

As expected, the model performance on the test set is worse than on the training set. In the next section, the performance of the Holt-Winters’ model is compared to the ARIMA.

#Plot Forecast
autoplot(forecast(cali1_ets, h = 20))

Check Residuals:

checkresiduals(cali1_ets)

## 
##  Ljung-Box test
## 
## data:  Residuals from ETS(A,N,A)
## Q* = 13.685, df = 5, p-value = 0.01774
## 
## Model df: 9.   Total lags used: 14

The residual plots above appear to be inconclusive. While the time series of residuals exhibits white noise-like behaviour, there is a slight downward trajectory. The spikes in the ACF are insignificant (apart from the spike at lag 11, attributable to random error) and the distribution of the errors is approximately normal with mean zero. However, the p-value of the Ljung-Box test is small enough for the null hypothesis to be rejected - an indication that the model shows a lack of fit.

Given this inconclusive residual analysis, forecasts from this model should be approached with caution.

4.2 Cali 2 Store

4.21 Inspect The Timeseries

The timeseries is decomposed into its component parts with STL, in order to inspect the relative importance of the trend, seasonal and remainder components.

cali2_stl<-cali2_train[,1]
stl(cali2_stl,s.window ="period") %>% autoplot 

From the relative size of the grey bars, it is seen that the trend component does not contribute much to the overall time series and so can be ignored. Consequently, the expected model is ETS(“A,N,A”), with additive errors and seasonality. However, due to the relatively large grey bar for the seasonal component, the ETS(“A,N,N”) model is also explored for its out-of-sample performance.

4.22 Determination of Best Model

cali2_ets_1<-ets(cali2_train, model="ZZZ") #Evaluate the best model
cali2_ets_1
## ETS(A,A,A) 
## 
## Call:
##  ets(y = cali2_train, model = "ZZZ") 
## 
##   Smoothing parameters:
##     alpha = 0.0127 
##     beta  = 0.0125 
##     gamma = 5e-04 
## 
##   Initial states:
##     l = 315.9648 
##     b = 3.6247 
##     s = 8.1102 50.6401 57.8214 40.4994 18.3055 -80.5061
##            -94.8704
## 
##   sigma:  42.8103
## 
##      AIC     AICc      BIC 
## 912.2838 917.2362 940.2526

The best model according to the optimisation is ETS(A,A,A). However, the smoothing parameter for the trend component is 0.0125, which is close to zero, suggesting the trend adds little. Therefore, the ETS(“A,N,A”) and ETS(“A,N,N”) models are also explored.

cali2_ets_2<-ets(cali2_train, model="ANA") #ANA Model
cali2_ets_3<-ets(cali2_train, model="ANN") #ANN Model

4.23 In/Out-of-Sample Performance

cali2_ets_out_sample<-as.data.frame(accuracy(forecast(cali2_ets_1, h = 19), cali2_test))
cali2_ets_out_sample<-rbind(cali2_ets_out_sample,accuracy(forecast(cali2_ets_2, h = 19), cali2_test))
cali2_ets_out_sample<-rbind(cali2_ets_out_sample,accuracy(forecast(cali2_ets_3, h = 19), cali2_test))
cali2_ets_out_sample$Model<- c("Model 1","Model 1", "Model 2", "Model 2", "Model 3", "Model 3")
cali2_ets_out_sample<- cali2_ets_out_sample %>% select(Model, everything())
cali2_ets_out_sample %>% kable(caption = "Out of Sample Performance") %>%  kable_styling("striped")
Out of Sample Performance
Model ME RMSE MAE MPE MAPE MASE ACF1 Theil’s U
Training set Model 1 -8.8667541 39.59116 30.57049 -4.868276 10.79776 0.7133458 -0.0497177 NA
Test set Model 1 89.7060900 101.04021 89.98506 29.590915 29.69424 2.0997529 -0.0804477 1.0880721
Training set1 Model 2 -3.9605391 41.26204 31.63586 -3.438702 10.98139 0.7382057 -0.0893617 NA
Test set1 Model 2 34.0494478 55.95950 44.43286 10.551570 14.25775 1.0368169 -0.0241791 0.6328567
Training set2 Model 3 -2.6510006 76.18909 62.80163 -7.544948 22.51971 1.4654421 0.3718191 NA
Test set2 Model 3 -0.1610933 65.95330 56.86687 -4.928026 19.78695 1.3269577 0.1353543 0.7255181

While Model 1 performs the best on the training sample, the out-of-sample/test RMSE is actually lowest for Model 2. This indicates that Model 1 is likely to have overfit the training data; for this reason, Model 2 (the ANA model) is selected.

#Plot Forecast
autoplot(forecast(cali2_ets_2, h = 19))

checkresiduals(cali2_ets_2)

## 
##  Ljung-Box test
## 
## data:  Residuals from ETS(A,N,A)
## Q* = 12.858, df = 5, p-value = 0.02475
## 
## Model df: 9.   Total lags used: 14

Although the p-value of the Ljung-Box test indicates that the null hypothesis can be rejected, the visual inspections of the residual timeseries, the ACF spikes and the near-normal distribution of the residuals all point to the suitability of the model.

4.3 NY 1 Store

4.31 Inspect The Timeseries

STL is used in order to decompose the timeseries.

ny1_stl<-ny1_train[,1]
stl(ny1_stl,s.window ="period") %>% autoplot 

It can be seen from the growing variance in the data and the growing error terms that a multiplicative model is more appropriate. Consequently, potential models to test include “M,A,M”, “M,N,M” and “M,A,N”, as the grey bars for the trend and seasonal components are relatively large.

ny1_ets_1<-ets(ny1_train, model="ZZZ")
ny1_ets_1
## ETS(M,N,M) 
## 
## Call:
##  ets(y = ny1_train, model = "ZZZ") 
## 
##   Smoothing parameters:
##     alpha = 0.1549 
##     gamma = 1e-04 
## 
##   Initial states:
##     l = 240.4624 
##     s = 1.0391 0.9567 0.9337 1.0025 0.8566 1.1237
##            1.0877
## 
##   sigma:  0.1438
## 
##      AIC     AICc      BIC 
## 962.0482 965.1467 986.1153

The optimisation selects the “M,N,M” model. Consequently, “M,A,M” and “M,A,N” are taken as the other two candidate models to be tested for their out-of-sample performance.

4.32 Determination of Best Model

ny1_ets_2<-ets(ny1_train, model="MAM")
ny1_ets_3<-ets(ny1_train, model="MAN")
ny1_ets_2
## ETS(M,Ad,M) 
## 
## Call:
##  ets(y = ny1_train, model = "MAM") 
## 
##   Smoothing parameters:
##     alpha = 1e-04 
##     beta  = 1e-04 
##     gamma = 1e-04 
##     phi   = 0.9762 
## 
##   Initial states:
##     l = 215.1624 
##     b = 1.9084 
##     s = 1.0242 0.9686 0.9256 1.0231 0.8399 1.134
##            1.0847
## 
##   sigma:  0.1381
## 
##      AIC     AICc      BIC 
## 960.2097 965.5627 991.4971
ny1_ets_3
## ETS(M,A,N) 
## 
## Call:
##  ets(y = ny1_train, model = "MAN") 
## 
##   Smoothing parameters:
##     alpha = 1e-04 
##     beta  = 1e-04 
## 
##   Initial states:
##     l = 221.016 
##     b = 0.8982 
## 
##   sigma:  0.1624
## 
##      AIC     AICc      BIC 
## 979.4819 980.2713 991.5155

These three models are tested for their out-of-sample performance. Note that ets() returned the damped-trend variant ETS(M,Ad,M) for the “MAM” specification: the damped argument defaults to NULL, so both the damped and non-damped versions are tried and the better one is kept.

4.33 In/Out-of-Sample Performance

ny1_ets_out_sample<-as.data.frame(accuracy(forecast(ny1_ets_1, h = 20), ny1_test))
ny1_ets_out_sample<-rbind(ny1_ets_out_sample,accuracy(forecast(ny1_ets_2, h = 20), ny1_test))
ny1_ets_out_sample<-rbind(ny1_ets_out_sample,accuracy(forecast(ny1_ets_3, h = 20), ny1_test))
ny1_ets_out_sample$Model<- c("Model 1","Model 1", "Model 2", "Model 2", "Model 3", "Model 3")
ny1_ets_out_sample<- ny1_ets_out_sample %>% select(Model, everything())
ny1_ets_out_sample %>% kable(caption = "Out of Sample Performance") %>%  kable_styling("striped")
Out of Sample Performance
Model ME RMSE MAE MPE MAPE MASE ACF1 Theil’s U
Training set Model 1 2.0251110 35.05182 26.63933 -0.8240822 10.287605 0.7491376 -0.0429736 NA
Test set Model 1 8.6631279 44.59696 36.87301 0.7187325 12.946867 1.0369237 0.2157562 0.7266446
Training set1 Model 2 -1.3569147 33.63185 25.40108 -2.1492265 9.881108 0.7143162 0.0231868 NA
Test set1 Model 2 -10.5509412 45.39093 39.71499 -6.4144413 14.996952 1.1168445 0.2036839 0.7652000
Training set2 Model 3 -0.1980519 41.82318 33.31233 -2.5115689 13.036344 0.9367922 0.0720625 NA
Test set2 Model 3 -29.4331996 60.69639 56.00368 -14.4006278 21.732654 1.5749067 0.3355449 1.0179866
#Plot Forecast
autoplot(forecast(ny1_ets_1, h = 20))

Model 1 is therefore selected since it gives the best fit on RMSE for the test set.

checkresiduals(ny1_ets_1)

## 
##  Ljung-Box test
## 
## data:  Residuals from ETS(M,N,M)
## Q* = 10.696, df = 5, p-value = 0.05775
## 
## Model df: 9.   Total lags used: 14

Model 1 also passes all residual diagnostics, providing further confidence in this model.

4.4 NY 2 Store

4.41 Inspect The Timeseries

As in previous cases, the timeseries is decomposed in order to inspect the seasonal, trend and error components.

ny2_stl<-ny2_train[,1]
stl(ny2_stl,s.window ="period") %>% autoplot 

From the plots above, the components are likely to be additive. However, the relatively large grey bar for the trend component indicates that it does not contribute greatly to the variations in the timeseries. Consequently, potential models to test include “A,A,A”, “A,N,A”, “A,A,N” and “A,N,N”, as the grey bar for the seasonal component also appears to be quite large.

4.42 Determination of Best Model

ny2_ets_1<-ets(ny2_train, model="ZZZ")
ny2_ets_1
## ETS(A,N,A) 
## 
## Call:
##  ets(y = ny2_train, model = "ZZZ") 
## 
##   Smoothing parameters:
##     alpha = 0.1895 
##     gamma = 1e-04 
## 
##   Initial states:
##     l = 215.2573 
##     s = 13.4722 30.6874 15.4489 18.7483 6.1767 -69.3617
##            -15.1719
## 
##   sigma:  44.4909
## 
##      AIC     AICc      BIC 
## 839.1011 842.8299 861.5861

The optimisation provides A,N,A as the best model. Therefore, the other three are considered as candidate models.

#Candidate Models
ny2_ets_2<-ets(ny2_train, model="AAA")
ny2_ets_3<-ets(ny2_train, model="AAN")
ny2_ets_4<-ets(ny2_train, model="ANN")

4.43 In/Out-of-Sample Performance

ny2_ets_out_sample<-as.data.frame(accuracy(forecast(ny2_ets_1, h = 18), ny2_test))
ny2_ets_out_sample<-rbind(ny2_ets_out_sample,accuracy(forecast(ny2_ets_2, h = 18), ny2_test))
ny2_ets_out_sample<-rbind(ny2_ets_out_sample,accuracy(forecast(ny2_ets_3, h = 18), ny2_test))
ny2_ets_out_sample<-rbind(ny2_ets_out_sample,accuracy(forecast(ny2_ets_4, h = 18), ny2_test))
ny2_ets_out_sample$Model<- c("Model 1","Model 1", "Model 2", "Model 2", "Model 3", "Model 3", "Model 4", "Model 4")
ny2_ets_out_sample<- ny2_ets_out_sample %>% select(Model, everything())
ny2_ets_out_sample %>% kable(caption = "Out of Sample Performance") %>%  kable_styling("striped")
Out of Sample Performance
Model ME RMSE MAE MPE MAPE MASE ACF1 Theil’s U
Training set Model 1 -0.5314202 41.53243 34.24962 -13.4603626 26.57567 0.7226142 0.0831474 NA
Test set Model 1 7.9390062 47.20564 29.52803 0.6841513 12.93265 0.6229960 -0.1930890 0.7559118
Training set1 Model 2 -0.2728030 41.19026 33.94867 -13.5247902 26.65282 0.7162647 0.0719375 NA
Test set1 Model 2 1.7881146 46.25062 30.78688 -2.4625522 13.88885 0.6495557 -0.2030599 0.7424033
Training set2 Model 3 -9.8778535 52.40140 38.23240 -27.7974478 38.46749 0.8066447 0.2219707 NA
Test set2 Model 3 63.3531198 87.00047 67.75642 25.3560064 28.79608 1.4295560 0.2200590 1.3949020
Training set3 Model 4 2.9188311 52.59888 40.41706 -19.5549004 36.00171 0.8527376 0.1735691 NA
Test set3 Model 4 -1.3356800 54.34285 37.87089 -6.2794817 18.16734 0.7990175 0.0952819 0.8511216

Model 2 performs the best on the test RMSE and is therefore selected.

#Plot Forecast
autoplot(forecast(ny2_ets_2, h = 18))

checkresiduals(ny2_ets_2)

## 
##  Ljung-Box test
## 
## data:  Residuals from ETS(A,A,A)
## Q* = 12.145, df = 3, p-value = 0.006904
## 
## Model df: 11.   Total lags used: 14

The residuals generally exhibit a white noise pattern, with insignificant lags. However, the distribution appears slightly skewed and the Ljung-Box test returns a significant p-value. Consequently, the residual diagnostics are inconclusive and forecasts from this model should be treated with some caution.

5. Performance comparison between Holt-Winters and ARIMA: Summary of All Models and Overall Model Selection

Summary of best performing models:

Firstly, a summary of the best of the ARIMA and Holt-Winters models for each store is provided below:

  1. Cali1 Store (46673): The best of the ARIMA Models is ARIMA(1,0,0)(0,1,1)[7]. The best of the Holt-Winters models is ETS(A,N,A).

  2. Cali2 Store (4904): The best of the ARIMA Models is ARIMA(0,1,1)(0,1,1)[7]. The best of the Holt-Winters models is ETS(A,N,A).

  3. NY1 Store (12631): The best of the ARIMA Models is ARIMA (0,1,1)(2,0,0)[7]. The best of the Holt-Winters models is ETS(M,N,M).

  4. NY2 Store (20974): The best of the ARIMA Models is ARIMA (0,0,1)(1,0,0)[7] with non-zero mean. The best of the Holt-Winters models is ETS(A,A,A).

Comparison between Holt-Winters and ARIMA Models: While the various models were fit using a range of information criteria (AIC, AICc, BIC), these criteria involve only the training data and provide no evidence of how well the models generalise. Furthermore, as the likelihood is calculated differently for ARIMA and ETS models, their information criteria cannot be compared directly. Consequently, the in-sample and out-of-sample performance metrics calculated in earlier sections, specifically RMSE, are used for the final selection. Ideally, the best model provides the lowest RMSE along with consistent performance between the training and test samples, indicating that it generalises well to unseen data (a compromise between underfitting and overfitting).
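
For reference, the RMSE used for selection is simply the square root of the mean squared forecast error; a hand computation on toy numbers (the Mean Squared Deviation used for final judging in Section 1 is this quantity squared):

```r
# RMSE by hand on toy numbers (not store data)
actual <- c(200, 210, 190)
fcast  <- c(195, 220, 185)
rmse   <- sqrt(mean((actual - fcast)^2))
round(rmse, 4)   # 7.0711
```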

  1. Cali1 Store: Below is the comparison of the ETS(A,N,A) and ARIMA(1,0,0)(0,1,1)[7] models.
cali1_overall_fit<-cali1_ets_out_sample #Create a copy to preserve original dataframe
cali1_overall_fit$Model<-c("Holt-Winters","Holt-Winters")
cali1_overall_fit<-cali1_overall_fit %>% select(Model, everything()) #Reorder
#Merge with ARIMA out of sample
cali1_overall_fit<-rbind(cali1_overall_fit,cali1_out_sample[cali1_out_sample$Model=="Model 1",]) 
cali1_overall_fit[3:4,1]<-c("ARIMA","ARIMA")
cali1_overall_fit %>% kable(caption = "Performance Comparison: Holt-Winters VS ARIMA") %>%  kable_styling("striped") 
Performance Comparison: Holt-Winters VS ARIMA
Model ME RMSE MAE MPE MAPE MASE ACF1 Theil’s U
Training set Holt-Winters -1.8616608 23.22218 18.45667 -4.154372 14.30555 0.6970043 0.0562287 NA
Test set Holt-Winters 19.4164807 33.28287 25.92539 12.463420 18.22348 0.9790556 0.2157657 0.4540012
Training set1 ARIMA -0.7949027 24.95910 18.28022 -2.980306 13.47389 0.6903407 0.0052265 NA
Test set1 ARIMA 7.6579755 32.69719 23.53935 1.828789 17.15170 0.8889481 0.1942585 0.4717754

On RMSE, the performance of the ARIMA and Holt-Winters models is comparable, including the spread of errors between the training and test sets. However, given the “approach with caution” warning for the Holt-Winters model in section 4.13 and the slightly lower test RMSE for the ARIMA, ARIMA(1,0,0)(0,1,1)[7] is selected as the final model.

  2. Cali2 Store:

Below is a comparison of the ETS(A,N,A) and ARIMA(0,1,1)(0,1,1)[7] Models.

cali2_overall_fit<-cali2_ets_out_sample[cali2_ets_out_sample$Model=="Model 2",] #Extract Relevant Rows
cali2_overall_fit$Model<-c("Holt-Winters","Holt-Winters") #Rename Rows
#Combine with ARIMA
cali2_overall_fit<-rbind(cali2_overall_fit,cali2_out_sample[cali2_out_sample$Model=="Model 1",]) 
cali2_overall_fit[3:4,1]<-c("ARIMA","ARIMA")
cali2_overall_fit %>% kable(caption = "Performance Comparison: Holt-Winters VS ARIMA") %>%  kable_styling("striped") 
Performance Comparison: Holt-Winters VS ARIMA
Model ME RMSE MAE MPE MAPE MASE ACF1 Theil’s U
Training set1 Holt-Winters -3.960539 41.26204 31.63586 -3.438702 10.98139 0.7382057 -0.0893617 NA
Test set1 Holt-Winters 34.049448 55.95950 44.43286 10.551570 14.25775 1.0368169 -0.0241791 0.6328567
Training set ARIMA -7.638800 44.18003 31.90097 -3.637262 10.72865 0.7443920 -0.1074127 NA
Test set ARIMA 71.572809 94.50028 74.24301 21.922535 23.10160 1.7324206 -0.3237485 1.0594637

In this case, the Holt-Winters model, ETS(A,N,A), gives clearly superior performance on both the in-sample and out-of-sample sets. Therefore, ETS(A,N,A) is selected as the final model.

  3. NY1 Store:

Below is a comparison of the ETS(M,N,M) and ARIMA (0,1,1)(2,0,0)[7] Models.

ny1_overall_fit<-ny1_ets_out_sample[ny1_ets_out_sample$Model=="Model 1",] #Extract Relevant Rows
ny1_overall_fit$Model<-c("Holt-Winters","Holt-Winters") #Rename Rows
#Combine with ARIMA
ny1_overall_fit<-rbind(ny1_overall_fit,ny1_out_sample[ny1_out_sample$Model=="Model 1",]) 
ny1_overall_fit[3:4,1]<-c("ARIMA","ARIMA")
ny1_overall_fit %>% kable(caption = "Performance Comparison: Holt-Winters VS ARIMA") %>%  kable_styling("striped") 
Performance Comparison: Holt-Winters VS ARIMA
Model ME RMSE MAE MPE MAPE MASE ACF1 Theil’s U
Training set Holt-Winters 2.025111 35.05182 26.63933 -0.8240822 10.28761 0.7491376 -0.0429736 NA
Test set Holt-Winters 8.663128 44.59696 36.87301 0.7187325 12.94687 1.0369237 0.2157562 0.7266446
Training set1 ARIMA 2.514712 39.04974 29.64812 -1.0358261 11.50951 0.8337491 -0.0233680 NA
Test set1 ARIMA 11.993375 48.73665 39.11604 1.5721260 13.42434 1.1000011 0.3046582 0.7664256

While the performance of the ARIMA and Holt-Winters models is comparable, the Holt-Winters model shows superior performance on the test-set RMSE. Therefore, ETS(M,N,M) is selected as the final model.

  4. NY2 Store:

Below is a comparison of the ETS(A,A,A) and ARIMA (0,0,1)(1,0,0)[7] Models.

ny2_overall_fit<-ny2_ets_out_sample[ny2_ets_out_sample$Model=="Model 2",] #Extract Relevant Rows
ny2_overall_fit$Model<-c("Holt-Winters","Holt-Winters") #Rename Rows
#Combine with ARIMA
ny2_overall_fit<-rbind(ny2_overall_fit,ny2_out_sample[ny2_out_sample$Model=="Model 2",]) 
ny2_overall_fit[3:4,1]<-c("ARIMA","ARIMA")
ny2_overall_fit %>% kable(caption = "Performance Comparison: Holt-Winters VS ARIMA") %>%  kable_styling("striped") 
Performance Comparison: Holt-Winters VS ARIMA
Model ME RMSE MAE MPE MAPE MASE ACF1 Theil’s U
Training set1 Holt-Winters -0.272803 41.19026 33.94867 -13.524790 26.65282 0.7162647 0.0719375 NA
Test set1 Holt-Winters 1.788115 46.25062 30.78688 -2.462552 13.88885 0.6495557 -0.2030599 0.7424033
Training set11 ARIMA 1.312231 48.61274 38.78247 -18.941275 33.96957 0.8182503 0.0247216 NA
Test set11 ARIMA -1.826844 49.12370 32.22283 -5.550662 15.02731 0.6798520 -0.0368099 0.7350267

While the Holt-Winters model provides the lower RMSE on the test sample, the ARIMA model exhibits a narrower spread between the training and test sets. Since the difference in test RMSE is small and the ARIMA model shows evidence of generalising better, it is preferred. Therefore, ARIMA(0,0,1)(1,0,0)[7] is selected as the final model.

6. Final Forecast for 06/16/2015 to 06/29/2015 Window

The final models are then retrained on the full datasets in order to produce forecasts for the 06/16/2015 to 06/29/2015 window. The results are also written to a separate CSV file.

#Make the original, full datasets ts objects
cali1_final<-ts(cali1[,2], frequency=7, start=c(03,05))
cali2_final<-ts(cali2[,2], frequency=7, start=c(03,13))
ny1_final<-ts(ny1[,2], frequency = 7, start=c(03,05))
ny2_final<-ts(ny2[,2], frequency = 7, start=c(03,20))
#Train final models on the full data set

#Cali1 Final Model: ARIMA(1,0,0)(0,1,1)[7]
cali1.mfinal<-Arima(cali1_final, order = c(1,0,0), seasonal = list(order = c(0, 1, 1), period=7))

#Cali2 Final Model: ETS(A,N,A)
cali2.mfinal<-ets(cali2_final, model="ANA")

#NY1 Final Model: ETS(M,N,M)
ny1.mfinal<-ets(ny1_final, model="MNM")

#NY2 Final Model: ARIMA(0,0,1)(1,0,0)[7]
ny2.mfinal<-Arima(ny2_final, order = c(0,0,1), seasonal = list(order = c(1, 0, 0), period=7))
#Forecast for 14-day window
cali1_forecast<-forecast(cali1.mfinal,h=14)
cali2_forecast<-forecast(cali2.mfinal,h=14)
ny1_forecast<-forecast(ny1.mfinal,h=14)
ny2_forecast<-forecast(ny2.mfinal,h=14)
#Report Point Forecasts in Table

#Include Dates
Dates<-as.character(seq(ymd('2015-06-16'),ymd('2015-06-29'), by = 'days'))

#Extract Mean/Point Forecast

forecast_cali1<-cali1_forecast$mean 
forecast_cali2<-cali2_forecast$mean
forecast_ny1<-ny1_forecast$mean
forecast_ny2<-ny2_forecast$mean
final_prediction<-cbind(Dates,forecast_cali1,forecast_cali2,forecast_ny1,forecast_ny2)

#Rename Column Names
colnames(final_prediction)<-c("Dates","Store 46673", "Store 4904", "Store 12631", "Store 20974") 
#Write CSV
final_prediction<-as.data.frame(final_prediction)
final_prediction %>% kable(caption = "Final Prediction") %>%  kable_styling("striped")
Final Prediction
Dates Store 46673 Store 4904 Store 12631 Store 20974
2015-06-16 172.065673289277 359.29446057546 258.196528985547 223.711585280169
2015-06-17 175.158697394708 347.835410264577 276.195535773721 233.026989668809
2015-06-18 164.803362692132 308.86599554549 302.647535094403 282.886883570584
2015-06-19 100.789620802174 212.264763379943 307.136078143092 200.042752164558
2015-06-20 76.6759675777962 212.951138432589 233.225435635271 213.850107398896
2015-06-21 168.66757546734 336.596678298125 266.344536763498 198.892139228364
2015-06-22 176.529685366448 336.878467050087 246.864054242653 236.862366122792
2015-06-23 160.781118446457 359.29446057546 258.196587683542 220.643167113522
2015-06-24 173.271331392456 347.835410264577 276.195598563582 224.215975378674
2015-06-25 164.487696679953 308.86599554549 302.64760389782 243.339121685568
2015-06-26 100.73682498455 212.264763379943 307.136147966927 211.565278591037
2015-06-27 76.6671373640542 212.951138432589 233.225488656377 216.860919106792
2015-06-28 168.666098595151 336.596678298125 266.344597313848 211.123975214724
2015-06-29 176.529438356407 336.878467050087 246.864110364341 225.686986633051
write.csv(final_prediction, "FinalPrediction.csv")
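
Once actual demand for the 06/16/2015 to 06/29/2015 window is observed, the Mean Squared Deviation on which the forecasts will be judged (Section 1) can be computed per store; the numbers below are hypothetical placeholders, not real actuals:

```r
# Mean Squared Deviation on hypothetical actuals (illustration only)
actual <- c(170, 180, 160)
fcast  <- c(172, 175, 164)
msd    <- mean((actual - fcast)^2)
msd   # (4 + 25 + 16) / 3 = 15
```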